Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning

Authors

  • Xiaoxiao Guo
  • Satinder P. Singh
  • Honglak Lee
  • Richard L. Lewis
  • Xiaoshi Wang
Abstract

The combination of modern Reinforcement Learning and Deep Learning approaches holds the promise of making significant progress on challenging applications requiring both rich perception and policy selection. The Arcade Learning Environment (ALE) provides a set of Atari games that represent a useful benchmark set of such applications. A recent breakthrough in combining model-free reinforcement learning with deep learning, called DQN, achieves the best real-time agents thus far. Planning-based approaches achieve far higher scores than the best model-free approaches, but they exploit information that is not available to human players, and they are orders of magnitude slower than needed for real-time play. Our main goal in this work is to build a better real-time Atari game-playing agent than DQN. The central idea is to use the slow planning-based agents to provide training data for a deep-learning architecture capable of real-time play. We propose new agents based on this idea and show that they outperform DQN.
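
As an illustration of the training pipeline described above, the sketch below records (frame stack, planner action) pairs offline from a slow planner and then fits a fast convolutional policy by supervised classification, so that only a cheap forward pass is needed at play time. It is a minimal sketch in PyTorch under stated assumptions: the uct_plan placeholder, the random stand-in frames, the network sizes, and the hyperparameters are illustrative only, not the planner or architecture used in the paper.

import numpy as np
import torch
import torch.nn as nn

NUM_ACTIONS = 18           # full Atari action set
FRAME_SHAPE = (4, 84, 84)  # stacked, preprocessed frames (assumed preprocessing)

class PolicyCNN(nn.Module):
    """Small convolutional net mapping a frame stack to action logits."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
            nn.Linear(256, NUM_ACTIONS),
        )

    def forward(self, x):
        return self.head(self.features(x))

def uct_plan(frame_stack):
    # Placeholder for the slow UCT/MCTS planner, which in the paper uses the
    # emulator as a generative model and many simulated trajectories per move.
    return np.random.randint(NUM_ACTIONS)  # stand-in only

def collect_offline_dataset(num_steps=256):
    # Run the slow planner offline and record (frame stack, planner action)
    # pairs that later serve as supervised training data for the fast policy.
    frames, actions = [], []
    for _ in range(num_steps):
        frame_stack = np.random.rand(*FRAME_SHAPE).astype(np.float32)  # stand-in emulator output
        frames.append(frame_stack)
        actions.append(uct_plan(frame_stack))
    return torch.from_numpy(np.stack(frames)), torch.tensor(actions)

# Train the fast CNN to imitate the planner (multiclass classification).
frames, actions = collect_offline_dataset()
net = PolicyCNN()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(net(frames), actions)
    loss.backward()
    optimizer.step()

# At play time only the cheap forward pass runs, so the agent is real-time:
# action = net(frame_stack.unsqueeze(0)).argmax(dim=1)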


Similar articles

Deep Learning for Reward Design to Improve Monte Carlo Tree Search in ATARI Games

Monte Carlo Tree Search (MCTS) methods have proven powerful in planning for sequential decision-making problems such as Go and video games, but their performance can be poor when the planning depth and sampling trajectories are limited or when the rewards are sparse. We present an adaptation of PGRD (policy-gradient for reward design) for learning a reward-bonus function to improve UCT (an MCTS a...


Monte-Carlo Planning for Pathfinding in Real-Time Strategy Games

In this work, we explore two Monte-Carlo planning approaches: Upper Confidence Tree (UCT) and Rapidly-exploring Random Tree (RRT). Both approaches are applied in a real-time strategy game to solve the pathfinding problem. The planners are evaluated using a grid-based representation of our game world. The results show that the UCT planner solves the path planning problem...


Real-Time Path Planning using a Simulation-Based Markov Decision Process

This paper introduces a novel path planning technique called MCRT which is aimed at non-deterministic, partially known, real-time domains populated with dynamically moving obstacles, such as might be found in a real-time strategy (RTS) game. The technique combines an efficient form of Monte-Carlo tree search with the randomized exploration capabilities of rapidly exploring random tree (RRT) pla...


Neurohex: A Deep Q-learning Hex Agent

DeepMind’s recent spectacular success in using deep convolutional neural nets and machine learning to build superhuman level agents — e.g. for Atari games via deep Q-learning and for the game of Go via other deep Reinforcement Learning methods — raises many questions, including to what extent these methods will succeed in other domains. In this paper we consider DQL for the game of Hex: after s...


Learning to Play Hearthstone Using Machine Learning

The subject of this thesis is a new game called Hearthstone. It is a strategy card game developed by Blizzard Entertainment, in which players duel each other with cards they have collected. The game of Hearthstone provides a challenge for developing an artificial intelligence (AI) agent. The agent has to be able to deal with unknown information and stochastic events in a large search space. In ...



Journal title:

Volume:   Issue:

Pages:  -

Publication date: 2014